Tree edit distance with gaps

Author

  • Hélène Touzet
Abstract

The purpose of this paper is to study the definition of edit distances with convex gap weights for trees. In the special case of strings, this problem has led to classical solutions: Galil and Giancarlo, for example, gave an O(n^2 log n) algorithm in [2]. For trees, standard edit distance algorithms ([7] or, more recently, [4] with an O(n^3 log n) solution) are concerned with linear gap weights induced by pointwise edit operations: inserting or removing a single node (or a single edge) at each step. These algorithms may be adapted to deal with affine gap weights, with gap-opening and gap-extension penalties. However, as far as we know, no attempt has been made to extend these results to tree edit distances with arbitrary gap weights. The major motivation for this work comes from computational biology, with the comparison of RNA molecules. RNA secondary structures without tertiary interactions, such as pseudoknots or base triples, may be canonically encoded by trees (see [6] for details), so comparing RNA structures amounts to computing edit distances between trees. It is widely accepted that the insertion or deletion of a set of contiguous nucleotides can be assumed to result from a single mutational event. It therefore makes little sense to assign linear weight functions, as existing methods do; convex gap weight functions are much more sensible in this context. In the paper, we first prove that there exists no polynomial algorithm for the problem with convex gap weights, unless P = NP. In the second part, we consider a restriction of the definition of gaps to complete subtrees, and we obtain a quadratic algorithm for the associated tree edit distance.
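To make the difference between linear, affine, and arbitrary gap weights concrete, here is a minimal Python sketch for the string special case only. It is a naive cubic-time dynamic program that tries every gap length at each cell; it is not the algorithm of [2] and not the tree algorithm of the paper. The function name `gap_edit_distance`, the unit substitution cost, and the three example weight functions are all assumptions chosen for illustration.

```python
import math

def gap_edit_distance(a, b, gap_weight, sub_cost=lambda x, y: 0 if x == y else 1):
    """Edit distance between strings a and b where a run of k consecutive
    insertions or deletions costs gap_weight(k). Naive O(n*m*(n+m)) DP."""
    n, m = len(a), len(b)
    # D[i][j] = cheapest way to turn a[:i] into b[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = math.inf
            # Align a[i-1] with b[j-1] (match or substitution).
            if i > 0 and j > 0:
                best = D[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1])
            # End with a gap that deletes the last k characters of a[:i].
            for k in range(1, i + 1):
                best = min(best, D[i - k][j] + gap_weight(k))
            # End with a gap that inserts the last k characters of b[:j].
            for k in range(1, j + 1):
                best = min(best, D[i][j - k] + gap_weight(k))
            D[i][j] = best
    return D[n][m]

# Hypothetical example weights (not taken from the paper).
linear = lambda k: k                     # each gapped symbol costs 1
affine = lambda k: 2 + k                 # opening penalty 2, extension 1
sublinear = lambda k: 1 + math.log2(k)   # long gaps grow slowly in cost

print(gap_edit_distance("abcd", "ad", linear))   # 2
print(gap_edit_distance("abcd", "ad", affine))   # 4
```

With the linear weight, removing "bc" costs the same whether it is counted as one gap or two; with the affine weight, a single gap of length 2 (cost 4) is cheaper than two gaps of length 1 (cost 6), which mirrors the single-mutational-event argument above.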

Similar resources

Tree Edit Distance Cannot be Computed in Strongly Subcubic Time (unless APSP can)

The edit distance between two rooted ordered trees with n nodes labeled from an alphabet Σ is the minimum cost of transforming one tree into the other by a sequence of elementary operations consisting of deleting and relabeling existing nodes, as well as inserting new nodes. Tree edit distance is a well known generalization of string edit distance. The fastest known algorithm for tree edit dist...

A New Dissimilarity Measure Between Trees by Decomposition of Unit-Cost Edit Distance

Tree edit distance is a conventional dissimilarity measure between labeled trees. However, tree edit distance, including unit-cost edit distance, captures label similarity and tree-structure similarity simultaneously. Therefore, even if the label similarity between two trees that share many nodes with the same label is high, the high label similarity is hard to recognize from their tree e...

Computing approximate tree edit distance using relaxation labeling

This paper presents a new method for computing the tree edit distance problem with uniform edit cost. We commence by showing that any tree obtained with a sequence of cut operations is a subtree of the transitive closure of the original tree. We show that the necessary condition for any subtree to be a solution can be reduced to a clique problem in a derived structure. Using this idea we transf...

Determining Image Similarity from Pattern Matching of Abstract Syntax Trees of Tree Picture Grammars

This paper studies the use of tree edit distance for pattern matching of abstract syntax trees of images generated with tree picture grammars. This was done with a view to measuring its effectiveness in determining image similarity, when compared to current state of the art similarity measures used in Content Based Image Retrieval (CBIR). Eight computer based similarity measures were selected f...

Tree Edit Distance Problems: Algorithms and Applications to Bioinformatics

Tree-structured data often appear in bioinformatics. For example, glycans, RNA secondary structures and phylogenetic trees usually have tree structures. Comparison of trees is one of the fundamental tasks in the analysis of these data. Various distance measures have been proposed and utilized for comparison of trees, among which extensive studies have been done on tree edit distance. In this paper, we ...

Journal title:
  • Inf. Process. Lett.

Volume 85  Issue 

Pages  -

Publication date 2003